Abstract 

We construct the "expected signature matching" estimator for differential 
equations driven by rough paths and we prove its consistency and asymptotic 
normality. We use it to estimate the parameters of a diffusion and of a fractional 
diffusion, i.e. a differential equation driven by fractional Brownian motion. 

1 Introduction 

Statistical inference for stochastic processes is a huge field, both in terms of research 
output and importance. In particular, a lot of work has been done in the context of 
diffusions (see [20l [T¥]. [Tj for a general overview and [5] for some recent developments). 
Nevertheless, the problem of statistical inference for diffusions still poses many 
challenges, for example constructing the Maximum Likelihood Estimator (MLE) for 
the general multi-dimensional diffusion. An alternative method in this case is that 
of the Generalized Moment Matching Estimator (GMME). While, in general, less 
efficient compared to the MLE, the GMME is usually easier to use, more flexible and 
has been successfully applied to general Markov processes (see [8]). 

On the other hand, most methods of statistical inference in the context of non- 
Markovian continuous processes are restricted to models that depend linearly on 
the parameter. In the case of differential equations driven by fractional Brownian 
motion, some recent results can be found in P, Q], [22] . In pU] , the author discusses 
the problem of parameter estimation for differential equations driven by Volterra 
type processes - which include fractional Brownian motion. In all these papers, 
the analysis is restricted to models that depend linearly on the parameter and for 
parameters appearing in the drift. Finally, for non-Markovian processes coming from 
stochastic delay equations, see [T3| I2T]. 

The theory of rough paths provides a general framework for making sense of differ- 
ential equations driven by any type of noise modelled as a rough path - this includes 
diffusions, differential equations driven by fractional Brownian motion, delay 
equations and even delay equations driven by fractional Brownian motion (see [IS]). The 
basic ideas were developed in the '90s (see p2] and references within). However, 
the problem of statistical inference for differential equations driven by rough paths 
has not been addressed yet. This is exactly what we strive to do in this paper. 

The exact setting of the statistical problem we consider is the following: we 
observe many independent copies of specific iterated integrals of the response 
$\{Y_t,\ 0 \le t \le T\}$ of a differential equation 

$$dY_t = f(Y_t; \theta) \cdot dX_t, \quad Y_0 = y_0$$

driven by the rough path $X$. We will formally define what we mean by a rough 
path and a differential equation driven by it in section 2.1. Two examples of interest 
are $X_t = (t, W_t)$, where $W_t$ is Brownian motion and the differential equation is a 
Stratonovich stochastic differential equation, and $X_t = (t, B_t^h)$, where $B_t^h$ is fractional 
Brownian motion. The iterated integrals are observed at a fixed time $T$. 
However, if the response lives in more than one dimension, the iterated integrals will 
be functions of the whole path. For example, suppose that $Y_t = (Y_t^{(1)}, Y_t^{(2)})$ and we 
observe 

$$\iint_{0 < u_1 < u_2 < T} dY_{u_1}^{(1)} \, dY_{u_2}^{(2)}$$

for fixed time $T$. We further assume that the vector field $f(y;\theta)$ is polynomial 
in $y$ and depends on the unknown parameter $\theta$. Finally, we assume that we 
know the expected signature of the rough path $X$ on the interval $[0, T]$. Again, 
the signature of a rough path will be formally defined later. For now, let's just say 
that it is the set of all iterated integrals of X and its expectation fully describes the 
distribution of the rough path X. 

The first assumption is a bit unusual: it is much more common to assume that we 
observe one long path rather than many short ones. This setting is chosen for two 
reasons. The first is its simplicity: we develop here some basic tools for statistical 
inference for differential equations driven by rough paths. These can be generalized to 
other settings, such as observing one continuous path, provided that some ergodicity 
conditions are fulfilled. However, there are no general results on the ergodicity of 
differential equations driven by rough paths and ergodicity has to be checked for each 
case separately. For example, see [23] for some recent results on the ergodicity of 
differential equations driven by fractional Brownian motion. 

The second reason is that such settings arise in the context of "equation-free" 
modelling of multiscale models (see [H]). Suppose that we have access to some 
code that simulates the dynamics of a complex system, such as molecular dynamics. 
We treat the code as a "black box". We are interested in the global behavior of a 
function of our system that "lives" in the slow scale, i.e. in some limit its dynamics 
follow a diffusion, which is, however, unknown. The basic idea of "equation-free" 
modelling is to run the code for a short time and use the output to locally estimate 
the parameters of the differential equation. This process is repeated several times with 
carefully chosen initial conditions, so as to get an estimate of the global dynamics. 
To summarize, in this problem: 

(a) we observe many independent paths; 

(b) time is short; 

(c) we locally approximate the vector field by a polynomial. 

Currently, the estimation is done using the MLE approach, pretending that the data 
comes from the diffusion rather than the multiscale model (see [2]). However, for short 
time T we cannot expect the diffusion approximation to be a good one. We believe 
that in the scale of T, we can always approximate the dynamics by a differential 
equation driven by a rough path (see [19]). 

The structure of the paper is the following: we start by reviewing some basic 
concepts and results from the theory of rough paths and we give a precise description 
of the problem we consider. In section 3, we describe the methodology. The idea is 
simple: we want to match the theoretical and the empirical expected signatures of the response. 
However, in general we cannot expect to get an explicit formula for the theoretical 
expected signature, so we construct an approximation of it. We go on to give a precise 
definition of the "expected signature matching estimator" using this approximation 
and prove its consistency and asymptotic normality. 

In section 4, we apply the method to two examples that represent the most common 
RDEs: diffusions and differential equations driven by fractional Brownian motion. We 
have written a package in Mathematica (available upon request) that can be used to 
recreate the examples we include in the paper or try out new ones. 

2 Setting 

2.1 Some basic results from the theory of Rough Paths 

In this section, we review some of the basic results from the theory of rough paths. 
For more details, see [16] and references within. The goal of this theory is to give 
meaning to the differential equation 

$$dY_t = f(Y_t) \cdot dX_t, \quad Y_0 = y_0. \qquad (1)$$

for very general continuous paths $X$. More specifically, we think of $X$ and $Y$ as paths 
on a Euclidean space: $X : I \to \mathbb{R}^n$ and $Y : I \to \mathbb{R}^m$ for $I := [0,T]$, so $X_t \in \mathbb{R}^n$ and 
$Y_t \in \mathbb{R}^m$ for each $t \in I$. Also, $f : \mathbb{R}^m \to L(\mathbb{R}^n, \mathbb{R}^m)$, where $L(\mathbb{R}^n, \mathbb{R}^m)$ is the space of 
linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$, which is isomorphic to the space of $m \times n$ matrices. 
For the sake of simplicity, we will assume that $f(y)$ is a polynomial in $y$ - however, 
the theory holds for more general $f$. The path $X$ is any path of finite $p$-variation, 
meaning that 

$$\sup_{\mathcal{D} \subset [0,T]} \Big( \sum_{\ell} \| X_{t_\ell} - X_{t_{\ell-1}} \|^p \Big)^{1/p} < \infty,$$

where $\mathcal{D} = \{t_\ell\}_\ell$ goes through all possible partitions of $[0, T]$ and $\| \cdot \|$ is the Euclidean 
norm. Note that we will later define finite $p$-variation for multiplicative functionals, 
also to be defined later. 

The fact that $X$ is allowed to have any finite $p$-variation is exactly what makes 
this theory so general: Brownian motion is an example of a path that has finite 
$p$-variation for any $p > 2$, while fractional Brownian motion with Hurst index $h$ has 
finite $p$-variation for $p > 1/h$. We will define fractional Brownian motion formally in 
the corresponding example - for now, let us just say that it is Gaussian, self-similar 
but not Markovian except for $h = 1/2$, when it coincides with Brownian motion. 

When $p \in [1, 2)$, we say that $Y$ is a solution of (1) if 

$$Y_t = Y_s + \int_s^t f(Y_u) \cdot dX_u, \quad \forall (s,t) \in \Delta_T,$$

where $\Delta_T := \{(s,t);\ 0 \le s \le t \le T\}$. In this case, the integral is defined as the 
Young integral (see p2]). What does it mean for $Y$ to be a solution of (1) when 
$p \ge 2$? In order to answer this question, we first need to define the integral. To make 
this task possible, we re-write the integral so that the integrand is a function of the 
integrator: Set $f_{y_0}(\cdot) := f(\cdot + y_0)$. Define $h : \mathbb{R}^n \oplus \mathbb{R}^m \to \mathrm{End}(\mathbb{R}^n \oplus \mathbb{R}^m)$ by 

$$h(x, y) := \begin{pmatrix} I_{n \times n} & 0_{n \times m} \\ f_{y_0}(y) & 0_{m \times m} \end{pmatrix}. \qquad (2)$$

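Instead of defining $\int_s^t f(Y_u) \cdot dX_u$, we will define the integral 

$$\int_s^t h(Z_u) \cdot dZ_u, \quad \forall (s,t) \in \Delta_T, \qquad (3)$$

where $Z = (X, Y)$. Note that if $f$ is a polynomial in $y$, then $h$ will also be a polynomial 
in $z$. More generally, we will define this integral for any path $Z$ in $\mathbb{R}^{\ell_1}$ of finite 
$p$-variation and any polynomial $h : \mathbb{R}^{\ell_1} \to L(\mathbb{R}^{\ell_1}, \mathbb{R}^{\ell_2})$ of degree $q$. Since $h$ is a 
polynomial, its Taylor expansion will be a finite sum: 

$$h(z_2) = \sum_{k=0}^{q} h_k(z_1) \frac{(z_2 - z_1)^{\otimes k}}{k!}, \quad \forall z_1, z_2 \in \mathbb{R}^{\ell_1},$$

where $h_0 = h$ and $h_k : \mathbb{R}^{\ell_1} \to L\big((\mathbb{R}^{\ell_1})^{\otimes k}, L(\mathbb{R}^{\ell_1}, \mathbb{R}^{\ell_2})\big)$ and, for all $z \in \mathbb{R}^{\ell_1}$, $h_k(z)$ is a 
symmetric $k$-linear mapping from $\mathbb{R}^{\ell_1}$ to $L(\mathbb{R}^{\ell_1}, \mathbb{R}^{\ell_2})$, for $k \ge 1$. 

Suppose that $Z$ is a path of bounded variation (i.e. $p = 1$). Then, using the 
symmetry of $h_k(z)$ and the "shuffle product property", we can write 

$$h(Z_u) = \sum_{k=0}^{q} h_k(Z_s) Z^k_{s,u}, \quad \forall (s,u) \in \Delta_T,$$

where for every $(s,t) \in \Delta_T$, 

$$Z^0_{s,t} \equiv 1 \in \mathbb{R} \quad \text{and} \quad Z^k_{s,t} = \Big\{ \int \cdots \int_{s < u_1 < \cdots < u_k < t} dZ^{(i_1)}_{u_1} \cdots dZ^{(i_k)}_{u_k} \Big\}_{(i_1,\dots,i_k) \in \{1,\dots,\ell_1\}^k} \in (\mathbb{R}^{\ell_1})^{\otimes k}.$$

More specifically, we use the notation 

$$Z^{(i_1,\dots,i_k)}_{s,t} := \int \cdots \int_{s < u_1 < \cdots < u_k < t} dZ^{(i_1)}_{u_1} \cdots dZ^{(i_k)}_{u_k}.$$
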
The "shuffle product property" says that for any $(s, u) \in \Delta_T$ and any "words" 
$\sigma_1, \sigma_2 \in \bigcup_{k \ge 0} \{1, \dots, \ell_1\}^k$, we can write 

$$Z^{\sigma_1}_{s,u} Z^{\sigma_2}_{s,u} = \sum_{\sigma \in \sigma_1 \sqcup \sigma_2} Z^{\sigma}_{s,u},$$

where $\sigma_1 \sqcup \sigma_2$ is the shuffle product between the words $\sigma_1$ and $\sigma_2$, i.e. it is the 
set of all words (with repetition) that we can create by mixing up the letters of 
$\sigma_1$ and $\sigma_2$ without changing the order of letters within each word. For example, 
$(1, 2) \sqcup (2) = \{(1, 2, 2), (1, 2, 2), (2, 1, 2)\}$ (see [16]). This generalizes the "integration 
by parts" formula. Then, for all $(s, t) \in \Delta_T$, 

$$\int_s^t h(Z_u) \, dZ_u = \sum_{k=0}^{q} h_k(Z_s) Z^{k+1}_{s,t}.$$



Example 1. Let us demonstrate what we have said so far with an example. Consider 
the ordinary differential equation 

$$dY_t = Y_t \, dt + (Y_t^2 + 1) \, de^t, \quad Y_0 = 0.$$

Then, $X_t = (t, e^t)$ is a path in $\mathbb{R}^2$, $Y_t \in \mathbb{R}$ and $f(y) = (y, y^2 + 1) \in L(\mathbb{R}^2, \mathbb{R})$, which is 
polynomial of degree 2. In this case, $X$ is of bounded variation and $p = 1$. Following 
what we just mentioned, instead of defining the integral 

$$\int_s^t f(Y_u) \, dX_u = \int_s^t \big( Y_u \, du + (Y_u^2 + 1) \, de^u \big)$$

directly, we set $Z_t = (X_t, Y_t)' = (t, e^t, Y_t)' \in \mathbb{R}^3$ and 



h{Z t ) = I 

Z (3) (^f))2 + 1 
where $Z_t^{(3)} = Y_t$ is the projection of $Z_t$ to the third dimension. Then, the integral 
$\int_s^t h(Z_u) \, dZ_u$ becomes 

$$\int_s^t h(Z_u) \, dZ_u = \Big( 0, 0, \int_s^t f(Y_u) \, dX_u \Big),$$

so, defining $\int_s^t h(Z_u) \, dZ_u$ is equivalent to defining $\int_s^t f(Y_u) \, dX_u$. We now proceed to 
writing the integral as a linear combination of iterated integrals of $Z$, using the fact 
that $h$ is a quadratic polynomial. We define $h_k$ as 

$$h_0(z) = h(z), \quad h_1(z) = \{\partial_i h(z)\}_{i=1}^{3}, \quad h_2(z) = \{\partial_{i_1, i_2} h(z)\}_{i_1, i_2 = 1}^{3}.$$

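Also, we note that 

$$\big((z_2 - z_1)^{\otimes 1}\big)^{(3)} = z_2^{(3)} - z_1^{(3)} \quad \text{and} \quad \big((z_2 - z_1)^{\otimes 2}\big)^{(3,3)} = (z_2^{(3)} - z_1^{(3)})(z_2^{(3)} - z_1^{(3)}),$$

and thus the only non-constant row (the third) of the sum $\sum_{k=0}^{2} h_k(z_1) \frac{(z_2 - z_1)^{\otimes k}}{k!}$ becomes 

$$\Big( z_1^{(3)} + (z_2^{(3)} - z_1^{(3)}), \ \ \big((z_1^{(3)})^2 + 1\big) + 2 z_1^{(3)} (z_2^{(3)} - z_1^{(3)}) + (z_2^{(3)} - z_1^{(3)})^2, \ \ 0 \Big),$$

so that the sum is equal to $h(z_2)$. It is easy to see that for all $0 \le s < t \le T$, 

$$z_t^{(3)} - z_s^{(3)} = \int_s^t dz_u^{(3)} \quad \text{and} \quad \frac{(z_t^{(3)} - z_s^{(3)})^2}{2} = \int_s^t \int_s^{u_2} dz_{u_1}^{(3)} \, dz_{u_2}^{(3)}.$$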

Thus, using the notation of the iterated integral, we write 

$$h(z_u) = h(z_s) + \partial_3 h(z_s) Z^{(3)}_{s,u} + \partial^2_{3,3} h(z_s) Z^{(3,3)}_{s,u}$$

and if we integrate once more we get 

jf h(z u )du = h(z s )Z { $ + d 3 h(z s )zif ) + dl,h{z s )Z^\ 

Note that in the above example, we did not use the shuffle product formula because 
$m = 1$ ($Y_t \in \mathbb{R}$). If the response $Y$ lives in more than one dimension, then the shuffle 
product formula is used, for example, to say that 

\(z t - z s )^(z t - z.)W = \Z^Z^ = Zjj*' + Z%*\ 

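Below we give a concrete example to show how the shuffle product formula extends 
integration by parts. 

Example 2. Let us give here an example of the shuffle product. Let $z_t$ be a smooth 
path in $\mathbb{R}^m$ for some $m > 1$. Then, for any pair $i_1, i_2 \in \{1, \dots, m\}$, using the 
integration by parts formula, we get 

$$Z^{(i_1, i_2)}_{s,t} = \int_s^t \int_s^{u_2} dz^{(i_1)}_{u_1} \, dz^{(i_2)}_{u_2} = \int_s^t (z^{(i_1)}_u - z^{(i_1)}_s) \, dz^{(i_2)}_u$$
$$= \int_s^t z^{(i_1)}_u \, dz^{(i_2)}_u - z^{(i_1)}_s (z^{(i_2)}_t - z^{(i_2)}_s)$$
$$= \big[ z^{(i_1)}_u z^{(i_2)}_u \big]_s^t - \int_s^t z^{(i_2)}_u \, dz^{(i_1)}_u - z^{(i_1)}_s (z^{(i_2)}_t - z^{(i_2)}_s)$$
$$= z^{(i_2)}_t (z^{(i_1)}_t - z^{(i_1)}_s) - \int_s^t z^{(i_2)}_u \, dz^{(i_1)}_u$$
$$= (z^{(i_2)}_t - z^{(i_2)}_s)(z^{(i_1)}_t - z^{(i_1)}_s) - \int_s^t (z^{(i_2)}_u - z^{(i_2)}_s) \, dz^{(i_1)}_u$$
$$= Z^{(i_2)}_{s,t} Z^{(i_1)}_{s,t} - Z^{(i_2, i_1)}_{s,t},$$

which is in agreement with the shuffle product formula, since the shuffle product of 
two letters is $(i_1) \sqcup (i_2) = \{(i_1, i_2), (i_2, i_1)\}$. 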
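
The shuffle product of two words can be enumerated mechanically. The following is a minimal Python sketch, not part of the package mentioned in the introduction; the function name `shuffle` and the representation of words as tuples are assumptions made for illustration. It reproduces the two shuffles quoted in the text, keeping repeated words.

```python
from collections import Counter

def shuffle(a, b):
    """Return all interleavings (with repetition) of the words a and b.

    Words are tuples of letters; the order of letters inside each word is kept.
    """
    if not a:
        return [tuple(b)]
    if not b:
        return [tuple(a)]
    # The first letter of a shuffled word comes either from a or from b.
    return ([(a[0],) + w for w in shuffle(a[1:], b)]
            + [(b[0],) + w for w in shuffle(a, b[1:])])

# Shuffle of two single letters, as in Example 2.
two_letters = shuffle((1,), (2,))
# Shuffle of (1, 2) with (2), counted with multiplicity.
mixed = Counter(shuffle((1, 2), (2,)))
```

Here `mixed` counts the word (1, 2, 2) twice, which is why the shuffle product has to be read as a set with repetition.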

It is now clear that in order to extend this construction to any path $Z$ of finite 
$p$-variation, where $p \ge 2$, we will first need to define their iterated integrals $Z^k_{s,t}$. 
These are not necessarily unique (for example, if $Z$ is Brownian motion, then Itô and 
Stratonovich gave two different definitions for the integral). Then, we will need to find 
those integrals that respect the "shuffle product property". Before going any further, 
we need to give some definitions: 

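Definition 2.1. Let $\Delta_T := \{(s,t);\ 0 \le s \le t \le T\}$. Let $p \ge 1$ be a real number. We 
denote by $T^{(k)}(\mathbb{R}^{\ell_1})$ the $k$-th truncated tensor algebra 

$$T^{(k)}(\mathbb{R}^{\ell_1}) := \mathbb{R} \oplus \mathbb{R}^{\ell_1} \oplus (\mathbb{R}^{\ell_1})^{\otimes 2} \oplus \cdots \oplus (\mathbb{R}^{\ell_1})^{\otimes k}.$$

(1) Let $\mathbf{Z} : \Delta_T \to T^{(k)}(\mathbb{R}^{\ell_1})$ be a continuous map. For each $(s, t) \in \Delta_T$, denote by 
$\mathbf{Z}_{s,t}$ the image of $(s, t)$ through $\mathbf{Z}$ and write 

$$\mathbf{Z}_{s,t} = (\mathbf{Z}^0_{s,t}, \mathbf{Z}^1_{s,t}, \dots, \mathbf{Z}^k_{s,t}) \in T^{(k)}(\mathbb{R}^{\ell_1}), \quad \text{where } \mathbf{Z}^j_{s,t} = \big\{ \mathbf{Z}^{(i_1,\dots,i_j)}_{s,t} \big\}_{i_1,\dots,i_j = 1}^{\ell_1}.$$

The function $\mathbf{Z}$ is called a multiplicative functional of degree $k$ in $\mathbb{R}^{\ell_1}$ if 
$\mathbf{Z}^0_{s,t} = 1$ for all $(s, t) \in \Delta_T$ and 

$$\mathbf{Z}_{s,u} \otimes \mathbf{Z}_{u,t} = \mathbf{Z}_{s,t} \quad \forall s, u, t \text{ satisfying } 0 \le s \le u \le t \le T,$$

i.e. for every $(i_1, \dots, i_l) \in \{1, \dots, \ell_1\}^l$ and $l = 1, \dots, k$: 

$$(\mathbf{Z}_{s,u} \otimes \mathbf{Z}_{u,t})^{(i_1,\dots,i_l)} = \sum_{j=0}^{l} \mathbf{Z}^{(i_1,\dots,i_j)}_{s,u} \mathbf{Z}^{(i_{j+1},\dots,i_l)}_{u,t}.$$

This is called Chen's identity. 

(2) A $p$-rough path $\mathbf{Z}$ in $\mathbb{R}^{\ell_1}$ is a multiplicative functional of degree $\lfloor p \rfloor$ in $\mathbb{R}^{\ell_1}$ 
that has finite $p$-variation, i.e. $\forall i = 1, \dots, \lfloor p \rfloor$ and $(s,t) \in \Delta_T$, it satisfies 

$$\| \mathbf{Z}^i_{s,t} \| \le \frac{(M(t-s))^{i/p}}{\beta \, (i/p)!},$$

where $\| \cdot \|$ is the Euclidean norm in the appropriate dimension, $\beta$ is a real 
number depending only on $p$ and $M$ is a fixed constant. The space of $p$-rough 
paths in $\mathbb{R}^{\ell_1}$ is denoted by $\Omega_p(\mathbb{R}^{\ell_1})$. 

(3) A geometric $p$-rough path is a $p$-rough path that can be expressed as a limit 
of 1-rough paths in the $p$-variation distance $d_p$, defined as follows: for any $\mathbf{X}, \mathbf{Y}$ 
continuous functions from $\Delta_T$ to $T^{(\lfloor p \rfloor)}(\mathbb{R}^{\ell_1})$, 

$$d_p(\mathbf{X}, \mathbf{Y}) = \max_{1 \le i \le \lfloor p \rfloor} \sup_{\mathcal{D} \subset [0,T]} \Big( \sum_{\ell} \| \mathbf{X}^i_{t_{\ell-1}, t_\ell} - \mathbf{Y}^i_{t_{\ell-1}, t_\ell} \|^{p/i} \Big)^{i/p},$$

where $\mathcal{D} = \{t_\ell\}_\ell$ goes through all possible partitions of $[0, T]$. The space of 
geometric $p$-rough paths in $\mathbb{R}^{\ell_1}$ is denoted by $G\Omega_p(\mathbb{R}^{\ell_1})$. 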

One of the main results of the theory of rough paths is the following, called the 
"extension theorem" : 

Theorem 2.2 (Theorem 3.7, [16]). Let $p \ge 1$ be a real number and $k \ge 1$ be an 
integer. Let $\mathbf{X} : \Delta_T \to T^{(k)}(\mathbb{R}^n)$ be a multiplicative functional with finite $p$-variation. 
Assume that $k \ge \lfloor p \rfloor$. Then there exists a unique extension of $\mathbf{X}$ to a multiplicative 
functional $\mathbf{X} : \Delta_T \to T^{(k+1)}(\mathbb{R}^n)$. 

Let $X : [0, T] \to \mathbb{R}^n$ be an $n$-dimensional path of finite $p$-variation for $n > 1$. One 
way of constructing a $p$-rough path is by considering the set of all iterated integrals 
of degree up to $\lfloor p \rfloor$. If $X_t = (X_t^{(1)}, \dots, X_t^{(n)})$, we define $\mathbf{X} : \Delta_T \to T^{(\lfloor p \rfloor)}(\mathbb{R}^n)$ as follows: 

$$\mathbf{X}^0_{s,t} \equiv 1 \in \mathbb{R} \quad \text{and} \quad \mathbf{X}^k_{s,t} = \Big\{ \int \cdots \int_{s < u_1 < \cdots < u_k < t} dX^{(i_1)}_{u_1} \cdots dX^{(i_k)}_{u_k} \Big\}_{(i_1,\dots,i_k) \in \{1,\dots,n\}^k} \in (\mathbb{R}^n)^{\otimes k}$$

for $k = 1, \dots, \lfloor p \rfloor$. Note that Chen's identity is an identity all iterated integrals 
satisfy. For example, for the word $(i_1, i_2)$ Chen's identity says that 

$$(\mathbf{Z}_{s,t})^{(i_1, i_2)} = (\mathbf{Z}_{s,u})^{(i_1, i_2)} + (\mathbf{Z}_{s,u})^{(i_1)} (\mathbf{Z}_{u,t})^{(i_2)} + (\mathbf{Z}_{u,t})^{(i_1, i_2)}.$$

This follows by breaking the domain of integration $\{u_1, u_2 : s < u_1 < u_2 < t\}$ 
into three domains $\{u_1, u_2 : s < u_1 < u_2 < u\}$, $\{u_1, u_2 : u < u_1 < u_2 < t\}$ and 
$\{u_1, u_2 : s < u_1 < u \text{ and } u < u_2 < t\}$. 
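
For a path of bounded variation, Chen's identity can be checked numerically. The sketch below is an illustration under stated assumptions: the path is piecewise linear, the helper name `sig2` and the sample points are hypothetical, and only levels 1 and 2 are computed. Note that `sig2` itself appends one straight segment at a time using Chen's identity, so the final comparison is a consistency check between the one-pass computation and the two-piece concatenation, not an independent proof.

```python
def sig2(points):
    """Levels 1 and 2 of the iterated integrals of the piecewise-linear
    path through `points` (a list of d-dimensional tuples)."""
    d = len(points[0])
    lvl1 = [0.0] * d
    lvl2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(points, points[1:]):
        inc = [q[i] - p[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Appending a straight segment: cross term plus the
                # segment's own second-level term inc_i * inc_j / 2.
                lvl2[i][j] += lvl1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            lvl1[i] += inc[i]
    return lvl1, lvl2

# Hypothetical sample path in R^2, split at the intermediate time "u".
points = [(0.0, 0.0), (1.0, 0.5), (1.5, 2.0), (0.5, 2.5), (2.0, 3.0)]
a1, a2 = sig2(points[:3])      # Z_{s,u}
b1, b2 = sig2(points[2:])      # Z_{u,t}
c1, c2 = sig2(points)          # Z_{s,t}

# Largest violation of Chen's identity over all words (i, j).
chen_gap = max(
    abs(c2[i][j] - (a2[i][j] + a1[i] * b1[j] + b2[i][j]))
    for i in range(2) for j in range(2)
)
```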

When $p \in [1,2)$, the iterated integrals are uniquely defined as Young integrals. 
However, as we already mentioned, when $p \ge 2$ there will be more than one way 
of defining them. What the extension theorem says is that if the path has finite 
$p$-variation and we define the first $\lfloor p \rfloor$ iterated integrals, the rest will be uniquely 
defined. So, if the path is of bounded variation ($p = 1$) we only need to know its 
increments, while for an $n$-dimensional Brownian path, we need to define the second 
iterated integrals by specifying the rules on how to construct them. In general, we 
can think of a $p$-rough path as a path $X : [0,T] \to \mathbb{R}^n$ of finite $p$-variation, together 
with a set of rules on how to define the first $\lfloor p \rfloor$ iterated integrals. Once we know 
how to construct the first $\lfloor p \rfloor$, we know how to construct all of them. 

Definition 2.3. Let $X : [0,T] \to \mathbb{R}^n$ be a path. The set of all iterated integrals is 
called the signature of the path and is denoted by $S(X)$. 

We can now proceed to define the integral (3) when $Z$ is a path of finite $p$-variation 
with $p \ge 2$. First, it is clear that in order for the integral to be uniquely defined, we 
should define the first $\lfloor p \rfloor$ iterated integrals, so we define the integral not with respect 
to $Z$ but a corresponding $p$-rough path $\mathbf{Z}$. To extend the previous construction, we 
also need that $\mathbf{Z}$ satisfies the "shuffle product property". It is not hard to see that 
geometric $p$-rough paths do satisfy this property since they are limits of paths of 
bounded variation and for paths of bounded variation the property follows from the 
usual integration by parts formula (see also [H]). So, we will define $\int h(\mathbf{Z}) \, d\mathbf{Z}$, where 
$\mathbf{Z}$ is a geometric $p$-rough path in $\mathbb{R}^{\ell_1}$, i.e. $\mathbf{Z} \in G\Omega_p(\mathbb{R}^{\ell_1})$. 

By definition, there exists a sequence $\mathbf{Z}(r) \in \Omega_1(\mathbb{R}^{\ell_1})$ such that $d_p(\mathbf{Z}(r), \mathbf{Z}) \to 0$ as 
$r \to \infty$. Then, for each $r > 0$, we define $\tilde{\mathbf{Z}}(r) := \int h(\mathbf{Z}(r)) \, d\mathbf{Z}(r)$. These are also 
1-rough paths in $\mathbb{R}^{\ell_2}$ and thus, their higher iterated integrals are uniquely defined. In 
addition, it is possible to show that the map $\int h : \Omega_1(\mathbb{R}^{\ell_1}) \to \Omega_1(\mathbb{R}^{\ell_2})$ sending $\mathbf{Z}(r)$ 
to $\tilde{\mathbf{Z}}(r)$ is continuous in the $p$-variation topology. 

We define $\tilde{\mathbf{Z}} := \int h(\mathbf{Z}) \, d\mathbf{Z}$ as the limit of the $\tilde{\mathbf{Z}}(r)$ with respect to $d_p$ - this will 
also be a geometric $p$-rough path. In other words, the continuous map $\int h$ can be 
extended to a continuous map from $G\Omega_p(\mathbb{R}^{\ell_1})$ to $G\Omega_p(\mathbb{R}^{\ell_2})$, which are the closures of 
$\Omega_1(\mathbb{R}^{\ell_1})$ and $\Omega_1(\mathbb{R}^{\ell_2})$ respectively (see Theorem 4.12, p3]). 

Note that this construction of the integral can be extended to any $h \in \mathrm{Lip}(\gamma - 1)$ 
for $\gamma > p$ (see [16]). 

Remark 2.4. We say that a sequence $\mathbf{Z}(r)$ of $p$-rough paths converges to a $p$-rough 
path $\mathbf{Z}$ in the $p$-variation topology if there exists an $M \in \mathbb{R}$ and a sequence $a(r)$ converging 
to zero when $r \to \infty$, such that 

$$\| \mathbf{Z}(r)^i_{s,t} \|, \ \| \mathbf{Z}^i_{s,t} \| \le (M(t-s))^{i/p}, \quad \text{and}$$
$$\| \mathbf{Z}(r)^i_{s,t} - \mathbf{Z}^i_{s,t} \| \le a(r) (M(t-s))^{i/p}$$

for $i = 1, \dots, \lfloor p \rfloor$ and $(s,t) \in \Delta_T$. Note that this is not exactly equivalent to 
convergence in $d_p$: while convergence in $d_p$ implies convergence in the $p$-variation topology, 
the opposite is not true. Convergence in the $p$-variation topology implies that there is 
a subsequence that converges in $d_p$. 

We can now give the precise meaning of the solution of (1), when driven not by a 
path $X$ but a geometric $p$-rough path $\mathbf{X}$: 

Definition 2.5. Consider $\mathbf{X} \in G\Omega_p(\mathbb{R}^n)$ and $y_0 \in \mathbb{R}^m$. Set $f_{y_0}(\cdot) := f(\cdot + y_0)$ and 
define $h : \mathbb{R}^n \oplus \mathbb{R}^m \to \mathrm{End}(\mathbb{R}^n \oplus \mathbb{R}^m)$ as in (2). We call $\mathbf{Z} \in G\Omega_p(\mathbb{R}^n \oplus \mathbb{R}^m)$ a 
solution of (1) if the following two conditions hold: 

(i) $\mathbf{Z} = \int h(\mathbf{Z}) \, d\mathbf{Z}$. 

(ii) $\pi_{\mathbb{R}^n}(\mathbf{Z}) = \mathbf{X}$, where by $\pi_{\mathbb{R}^n}$ we denote the projection of $\mathbf{Z}$ to $\mathbb{R}^n$. 

As in the case of ordinary differential equations ($p = 1$), it is possible to construct 
the solution using Picard iterations: we define $\mathbf{Z}(0) := (\mathbf{X}, \mathbf{e})$, where by $\mathbf{e}$ we denote 
the trivial rough path $\mathbf{e} = (1, 0_{\mathbb{R}^m}, 0_{(\mathbb{R}^m)^{\otimes 2}}, \dots)$. Then, for every $r \ge 1$, we define 
$\mathbf{Z}(r) = \int h(\mathbf{Z}(r-1)) \, d\mathbf{Z}(r-1)$. The following theorem, called the "Universal Limit 
Theorem", gives the conditions for the existence and uniqueness of the solution to 
(1). The theorem holds for any $f \in \mathrm{Lip}(\gamma)$ for $\gamma > p$ but we will assume that $f$ is a 
polynomial. The proof is based on the convergence of the Picard iterations. 
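
To see what the Picard iterations look like in the simplest case, consider the scalar equation $dY_t = Y_t \, dX_t$ with $n = m = 1$ and $X$ of bounded variation, so that $f_{y_0}(y) = y_0 + y$ and the response is measured as an increment from $y_0$. The Python sketch below is an illustration only; the function names and the numerical values are assumptions. Each iterate is stored as its coefficients on the iterated integrals of $X$, using the fact that in one dimension the $k$-fold iterated integral over $(0,t)$ equals $(X_t - X_0)^k / k!$.

```python
import math

def picard_coeffs(y0, r):
    """Coefficients c[k] such that Y(r)_{0,t} = sum_k c[k] * X^{(1,...,1)}_{0,t}
    (word of k letters) for the scalar equation dY = Y dX, started at y0."""
    coeffs = {}                      # Y(0) is the trivial path: no terms
    for _ in range(r):
        new = {1: y0}                # integral of the constant y0 against dX
        for k, c in coeffs.items():
            new[k + 1] = c           # integrating a k-letter word appends a letter
        coeffs = new
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the iterate when the increment of X over (0, t) equals x."""
    return sum(c * x**k / math.factorial(k) for k, c in coeffs.items())

# Hypothetical values: y0 = 2, increment of X equal to 0.3, ten iterations.
approx = evaluate(picard_coeffs(2.0, 10), 0.3)
exact = 2.0 * (math.exp(0.3) - 1.0)   # increment of the exact solution y0 * exp(x)
```

In this example the $r$-th iterate is a linear combination of iterated integrals of the driving path with words of length at most $r$, and the iterates approach the increment of the exact solution.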

Theorem 2.6 (Theorem 5.3, [IS]). Let $p \ge 1$. For all $\mathbf{X} \in G\Omega_p(\mathbb{R}^n)$ and all $y_0 \in \mathbb{R}^m$, 
equation (1) admits a unique solution $\mathbf{Z} = (\mathbf{X}, \mathbf{Y}) \in G\Omega_p(\mathbb{R}^n \oplus \mathbb{R}^m)$, in the sense of 
Definition 2.5. This solution depends continuously on $\mathbf{X}$ and $y_0$ and the mapping 
$I_f : G\Omega_p(\mathbb{R}^n) \to G\Omega_p(\mathbb{R}^m)$ which sends $(\mathbf{X}, y_0)$ to $\mathbf{Y}$ is continuous in the $p$-variation 
topology. 

The rough path $\mathbf{Y}$ is the limit of the sequence $\mathbf{Y}(r)$, where $\mathbf{Y}(r)$ is the projection of 
the $r$-th Picard iteration $\mathbf{Z}(r)$ to $\mathbb{R}^m$. For all $p > 1$, there exists $T_p \in (0,T]$ such that 

\\nr)U - Y(r + 1)1,11 < , %s,t) e A Tp , V< = 0,...,bJ- 

/3 



The constant $T_p$ depends only on $f$ and $p$. 

2.2 The problem 

We now describe the problem that we are going to study in the rest of the paper. Let 
$(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathbf{X} : \Omega \to G\Omega_p(\mathbb{R}^n)$ a random variable, taking 
values in the space of geometric $p$-rough paths endowed with the $p$-variation topology. 
For each $\omega \in \Omega$, the rough path $\mathbf{X}(\omega)$ drives the following differential equation 

$$dY_t(\omega) = f(Y_t(\omega); \theta) \cdot dX_t(\omega), \quad Y_0 = y_0 \qquad (5)$$

where $\theta \in \Theta \subseteq \mathbb{R}^d$, $\Theta$ being the parameter space. As before, 
$f : \mathbb{R}^m \times \Theta \to L(\mathbb{R}^n, \mathbb{R}^m)$ and $f_\theta(y) := f(y; \theta)$ is a polynomial in $y$ for each $\theta \in \Theta$. 
According to Theorem 2.6, we can think of equation (5) as a map 

$$I_{f_\theta, y_0} : G\Omega_p(\mathbb{R}^n) \to G\Omega_p(\mathbb{R}^m), \qquad (6)$$

sending a geometric $p$-rough path $\mathbf{X}$ to a geometric $p$-rough path $\mathbf{Y}$, which is continuous 
with respect to the $p$-variation topology. Consequently, 

$$\mathbf{Y} := I_{f_\theta, y_0} \circ \mathbf{X} : \Omega \to G\Omega_p(\mathbb{R}^m)$$

is also a random variable, taking values in $G\Omega_p(\mathbb{R}^m)$ and if $\mathbb{P}_T$ is the distribution of 
$\mathbf{X}_{0,T}$, the distribution of $\mathbf{Y}_{0,T}$ will be 

$$\mathbb{Q}^\theta_T = \mathbb{P}_T \circ I^{-1}_{f_\theta, y_0}. \qquad (7)$$

Suppose that we know the expected signature of $\mathbf{X}$ at $[0, T]$, i.e. we know 

$$E\big(\mathbf{X}^{(i_1,\dots,i_k)}_{0,T}\big) := E\Big( \int \cdots \int_{0 < u_1 < \cdots < u_k < T} dX^{(i_1)}_{u_1} \cdots dX^{(i_k)}_{u_k} \Big),$$

for all $i_j \in \{1, \dots, n\}$ where $j = 1, \dots, k$ and $k \ge 1$. Our goal will be to estimate $\theta$, 
given several realizations of $\mathbf{Y}_{0,T}$, i.e. $\{\mathbf{Y}_{0,T}(\omega_i)\}_{i=1}^{N}$. 

3 Method 

In order to estimate $\theta$, we are going to use a method that is similar to the "Method 
of Moments". The idea is simple: we will try to (partially) match the empirical 
expected signature of the observed $p$-rough path with the theoretical one, which is a 
function of the unknown parameters. Remember that the data we have available is 
several realizations of the $p$-rough path $\mathbf{Y}_{0,T}$ described in section 2.2. To make this 
more precise, let us introduce some notation: let 

$$E^\tau(\theta) := E^\theta\big(\mathbf{Y}^\tau_{0,T}\big) \qquad (8)$$

be the theoretical expected signature corresponding to parameter value $\theta$ and word $\tau$ 
and 

$$M^\tau_N := \frac{1}{N} \sum_{i=1}^{N} \mathbf{Y}^\tau_{0,T}(\omega_i) \qquad (9)$$

be the empirical expected signature, which is a Monte Carlo approximation of the 
actual one. The word $\tau$ is constructed from the alphabet $\{1, \dots, m\}$, i.e. $\tau \in W_m$ 
where $W_m := \bigcup_{k \ge 0} \{1, \dots, m\}^k$. The idea is to find $\hat\theta$ such that 

$$E^\tau(\hat\theta) = M^\tau_N, \quad \forall \tau \in V \subset W_m$$

for some choice of a set of words $V$. Then, $\hat\theta$ will be our estimate. 
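
The matching idea can be illustrated on a toy model where everything is explicit. The sketch below is not the estimator analysed in the paper and does not use its Mathematica package; it assumes the scalar equation $dY_t = \theta \, dW_t$ driven by a one-dimensional Brownian motion, for which $\mathbf{Y}^{(1,1)}_{0,T} = (Y_T - y_0)^2 / 2$ and $E^{(1,1)}(\theta) = \theta^2 T / 2$, and it assumes $\theta > 0$ so that the matching equation has a unique solution. All function names and numerical values are hypothetical.

```python
import math
import random

def simulate_observations(theta, T, N, rng):
    """N independent copies of Y^{(1,1)}_{0,T} = (Y_T - y_0)^2 / 2
    for the toy model dY = theta dW."""
    return [0.5 * (theta * rng.gauss(0.0, math.sqrt(T))) ** 2 for _ in range(N)]

def match_expected_signature(observations, T):
    """Solve theta^2 * T / 2 = M for theta > 0, where M is the
    empirical expected signature for the word (1, 1)."""
    M = sum(observations) / len(observations)
    return math.sqrt(2.0 * M / T)

rng = random.Random(0)               # fixed seed so the run is reproducible
true_theta, T, N = 1.5, 0.5, 20000
theta_hat = match_expected_signature(
    simulate_observations(true_theta, T, N, rng), T)
```

With $N$ of this size, `theta_hat` should lie close to `true_theta`; the word (1) alone would not identify $\theta$ here, since its expectation is zero for every $\theta$.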

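Remark 3.1. When $m = 1$, the expected signature of $\mathbf{Y}$ is equivalent to its moments, 
since 

$$\mathbf{Y}^{(1,\dots,1)}_{0,T} = \frac{(Y_T - Y_0)^k}{k!} \quad \text{for the word with } k \text{ letters}.$$

When $m = 2$, one example is to consider the word $\tau = (1,2)$. Then, one needs to 
compute the iterated integral (or an approximation of it, if the path is discretely observed) 

$$\mathbf{Y}^{(1,2)}_{0,T}(\omega_i) = \int_0^T \int_0^s dY^{(1)}_u(\omega_i) \, dY^{(2)}_s(\omega_i)$$

for each path $Y_t(\omega_i) = \big(Y^{(1)}_t(\omega_i), Y^{(2)}_t(\omega_i)\big)$, for $i = 1, \dots, N$. Then 

$$M^{(1,2)}_N = \frac{1}{N} \sum_{i=1}^{N} \mathbf{Y}^{(1,2)}_{0,T}(\omega_i).$$

Note that this is closely related to the correlation of the two one-dimensional paths 
$\{Y^{(1)}_t\}_{t \in [0,T]}$ and $\{Y^{(2)}_t\}_{t \in [0,T]}$ since, by the shuffle product, 

$$\mathbf{Y}^{(1,2)}_{0,T}(\omega_i) + \mathbf{Y}^{(2,1)}_{0,T}(\omega_i) = \mathbf{Y}^{(1)}_{0,T}(\omega_i) \, \mathbf{Y}^{(2)}_{0,T}(\omega_i)$$

and by the law of large numbers, 

$$\lim_{N \to \infty} \big( M^{(1,2)}_N + M^{(2,1)}_N \big) = E\big( (Y^{(1)}_T - Y^{(1)}_0)(Y^{(2)}_T - Y^{(2)}_0) \big).$$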
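
When the paths are only observed on a grid, one simple way to approximate the iterated integrals is to interpolate linearly between observations. The Python sketch below is one possible approximation, stated under that assumption; the helper name `level2` and the sample data are hypothetical. It also checks the shuffle identity quoted in the remark, which holds exactly for the piecewise-linear interpolation.

```python
def level2(path):
    """Approximate Y^(1), Y^(2), Y^(1,2) and Y^(2,1) over the whole grid for a
    two-dimensional path given as a list of (y1, y2) observations,
    using linear interpolation between observations."""
    a = b = 0.0                  # running increments of the two coordinates
    i12 = i21 = 0.0
    for (p1, p2), (q1, q2) in zip(path, path[1:]):
        d1, d2 = q1 - p1, q2 - p2
        i12 += a * d2 + 0.5 * d1 * d2
        i21 += b * d1 + 0.5 * d1 * d2
        a += d1
        b += d2
    return a, b, i12, i21

# Two hypothetical discretely observed paths.
paths = [
    [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)],
    [(0.0, 1.0), (-1.0, 1.5), (0.5, 3.0), (2.0, 2.0)],
]
results = [level2(p) for p in paths]

# Empirical expected signature for the word (1, 2), as in (9).
M12 = sum(r[2] for r in results) / len(results)
```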

Several questions arise: 

(i) How can we get an analytic expression for $E^\tau(\theta)$ as a function of $\theta$? 

(ii) What is a good choice for $V$ or, for $m = 1$, how do we choose which moments 
to match? 

(iii) How good is $\hat\theta$ as an estimate? 

We will try to answer these questions below. 

3.1 Computing the Theoretical Expected Signature 

We want to get an analytic expression for the expected signature of the $p$-rough path 
$\mathbf{Y}$ at $(0, T)$, where $\mathbf{Y}$ is the solution of (5) in the sense described above. In other 
words, we want to compute (8). We are given the expected signature of the $p$-rough 
path $\mathbf{X}$ which is driving the equation, again at $(0,T)$, i.e. we are given 

$$E\big(\mathbf{X}^\sigma_{0,T}\big), \quad \forall \sigma \in \{1, \dots, n\}^k, \ k \in \mathbb{N}.$$

In addition, we know the vector field $f_\theta(y) = f(y; \theta)$ in (5), up to the parameter $\theta$, and 
we know that it is polynomial. 

It turns out that we cannot compute (8) in general. We need to make one more 
approximation since the solution $\mathbf{Y}$ will not usually be available: we will 
approximate the solution by the $r$-th Picard iteration $\mathbf{Y}(r)$, described in the Universal Limit 
Theorem (Theorem 2.6). Finally, we will approximate the expected signature of the 
solution corresponding to a word $\tau$, $E^\tau(\theta)$, by the expected signature of the $r$-th Picard 
iteration at $\tau$, which we will denote by $E^\tau_r(\theta)$: 

$$E^\tau_r(\theta) := E^\theta\big(\mathbf{Y}(r)^\tau_{0,T}\big). \qquad (10)$$

The good news is that when $f_\theta$ is a polynomial of degree $q$ in $y$, for any $q \in \mathbb{N}$, 
the $r$-th Picard iteration of the solution is a linear combination of iterated integrals 
of the driving force $\mathbf{X}$. More specifically, for any realization $\omega$ and any time interval 
$(s, t) \in \Delta_T$, we can write: 

$$\mathbf{Y}(r)^\tau_{s,t} = \sum_{|\tau| \le |\sigma| \le |\tau| \frac{q^r - 1}{q - 1}} \alpha^\tau_{r,\sigma}(y_0, s; \theta) \, \mathbf{X}^\sigma_{s,t}, \qquad (11)$$

where $\alpha^\tau_{r,\sigma}(y; \theta)$ is a polynomial in $y$ of degree $q^r$ and $| \cdot |$ gives the length of a word. 
Thus, 

$$E^\tau_r(\theta) = \sum_{|\tau| \le |\sigma| \le |\tau| \frac{q^r - 1}{q - 1}} \alpha^\tau_{r,\sigma}(y_0, 0; \theta) \, E\big(\mathbf{X}^\sigma_{0,T}\big). \qquad (12)$$

We will prove (11), first for $p = 1$ and then for any $p \ge 1$ by taking limits with 
respect to $d_p$. We will need the following lemma. 

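Lemma 3.2. Suppose that $\mathbf{X} \in G\Omega_1(\mathbb{R}^n)$, $\mathbf{Y} \in G\Omega_1(\mathbb{R}^m)$ and it is possible to write 

$$\mathbf{Y}^{(j)}_{s,t} = \sum_{\sigma \in W_n, \ q_1 \le |\sigma| \le q_2} \alpha^{(j)}_\sigma(y_s) \, \mathbf{X}^\sigma_{s,t}, \quad \forall (s,t) \in \Delta_T \text{ and } \forall j = 1, \dots, m \qquad (13)$$

where $\alpha^{(j)}_\sigma : \mathbb{R}^m \to L(\mathbb{R}, \mathbb{R})$ is a polynomial of degree $q$ with $q, q_1, q_2 \in \mathbb{N}$ and $q_1 \ge 1$. 
Then, 

$$\mathbf{Y}^\tau_{s,t} = \sum_{\sigma \in W_n, \ |\tau| q_1 \le |\sigma| \le |\tau| q_2} \alpha^\tau_\sigma(y_s) \, \mathbf{X}^\sigma_{s,t}, \qquad (14)$$

for all $(s, t) \in \Delta_T$ and $\tau \in W_m$. Here $\alpha^\tau_\sigma : \mathbb{R}^m \to L(\mathbb{R}, \mathbb{R})$ are polynomials of degree $\le q|\tau|$. 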



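Proof. We will prove (14) by induction on $|\tau|$, i.e. the length of the word. By 
hypothesis, it is true when $|\tau| = 1$. Suppose that it is true for any $\tau \in W_m$ such that 
$|\tau| = k \ge 1$. First, note that from (13), we get that 

$$d\mathbf{Y}^{(j)}_{s,u} = \sum_{\sigma \in W_n, \ q_1 \le |\sigma| \le q_2} \alpha^{(j)}_\sigma(y_s) \, \mathbf{X}^{\sigma-}_{s,u} \, dX^{\sigma_\ell}_u, \quad \forall u \in [s,t],$$

where $\sigma-$ is the word $\sigma$ without the last letter and $\sigma_\ell$ is the last letter. For example, 
if $\sigma = (i_1, \dots, i_{b-1}, i_b)$, then $\sigma- = (i_1, \dots, i_{b-1})$ and $\sigma_\ell = i_b$. Note that this cannot 
be defined when $\sigma$ is the empty word ($b = 0$). Now suppose that $|\tau| = k + 1$, so 
$\tau = (j_1, \dots, j_k, j_{k+1})$ for some $j_1, \dots, j_{k+1} \in \{1, \dots, m\}$. Then 

$$\mathbf{Y}^\tau_{s,t} = \int_s^t \mathbf{Y}^{\tau-}_{s,u} \, d\mathbf{Y}^{(j_{k+1})}_{s,u} = \int_s^t \Big( \sum_{k q_1 \le |\sigma_1| \le k q_2} \alpha^{\tau-}_{\sigma_1}(y_s) \, \mathbf{X}^{\sigma_1}_{s,u} \Big) \Big( \sum_{q_1 \le |\sigma_2| \le q_2} \alpha^{(j_{k+1})}_{\sigma_2}(y_s) \, \mathbf{X}^{\sigma_2-}_{s,u} \, dX^{\sigma_{2,\ell}}_u \Big)$$
$$= \sum_{k q_1 \le |\sigma_1| \le k q_2, \ q_1 \le |\sigma_2| \le q_2} \alpha^{\tau-}_{\sigma_1}(y_s) \, \alpha^{(j_{k+1})}_{\sigma_2}(y_s) \int_s^t \mathbf{X}^{\sigma_1}_{s,u} \, \mathbf{X}^{\sigma_2-}_{s,u} \, dX^{\sigma_{2,\ell}}_u.$$

Now we use the fact that for any geometric rough path $\mathbf{X}$ and any $(s, u) \in \Delta_T$, we 
can write 

$$\mathbf{X}^{\sigma_1}_{s,u} \, \mathbf{X}^{\sigma_2-}_{s,u} = \sum_{\sigma \in \sigma_1 \sqcup (\sigma_2-)} \mathbf{X}^\sigma_{s,u}, \qquad (15)$$

where $\sigma_1 \sqcup (\sigma_2-)$ is the shuffle product between the words $\sigma_1$ and $\sigma_2-$. Applying (15) 
above, we get 

$$\mathbf{Y}^\tau_{s,t} = \sum_{\sigma \in W_n, \ (k+1) q_1 \le |\sigma| \le (k+1) q_2} \alpha^\tau_\sigma(y_s) \, \mathbf{X}^\sigma_{s,t},$$

where 

$$\alpha^\tau_\sigma(y_s) = \sum_{(\sigma_1 \sqcup \sigma_2-) \ni \sigma-, \ \sigma_\ell = \sigma_{2,\ell}} \alpha^{\tau-}_{\sigma_1}(y_s) \, \alpha^{(j_{k+1})}_{\sigma_2}(y_s)$$

is a polynomial of degree $\le kq + q = (k+1)q$. Note that the above sum is over all 
$\sigma_1, \sigma_2 \in W_n$ such that $k q_1 \le |\sigma_1| \le k q_2$ and $q_1 \le |\sigma_2| \le q_2$. □ 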



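We now prove (11) for $p = 1$. 

Lemma 3.3. Suppose that $\mathbf{X} \in G\Omega_1(\mathbb{R}^n)$ is driving system (1), where $f : \mathbb{R}^m \to 
L(\mathbb{R}^n, \mathbb{R}^m)$ is a polynomial of degree $q$. Let $\mathbf{Y}(r)$ be the projection of the $r$-th Picard 
iteration $\mathbf{Z}(r)$ to $\mathbb{R}^m$, as described above. Then, $\mathbf{Y}(r) \in G\Omega_1(\mathbb{R}^m)$ and it satisfies 

$$\mathbf{Y}(r)^\tau_{s,t} = \sum_{\sigma \in W_n, \ |\tau| \le |\sigma| \le |\tau| \frac{q^r - 1}{q - 1}} \alpha^\tau_{r,\sigma}(y_0, s) \, \mathbf{X}^\sigma_{s,t}, \qquad (16)$$

for all $(s,t) \in \Delta_T$ and $\tau \in W_m$. Here $\alpha^\tau_{r,\sigma}(y, s)$ is a polynomial of degree $\le |\tau| q^r$ in $y$. 

Proof. For every $r \ge 0$, $\mathbf{Z}(r) \in G\Omega_1(\mathbb{R}^{n+m})$ since $\mathbf{Z}(0) := (\mathbf{X}, \mathbf{e})$, $\mathbf{X} \in G\Omega_1(\mathbb{R}^n)$ and 
integrals preserve the roughness of the integrator. So, $\mathbf{Y}(r) \in G\Omega_1(\mathbb{R}^m)$. We will 
prove the claim by induction on $r$. 

For $r = 0$, $\mathbf{Y}(0) = \mathbf{e}$ and thus (16) becomes 

$$\mathbf{Y}(0)^\tau_{s,t} = \alpha^\tau_{0,\emptyset}(y_0, s)$$

and it is true for $\alpha^\emptyset_{0,\emptyset} \equiv 1$ and $\alpha^\tau_{0,\emptyset} \equiv 0$ for every $\tau \in W_m$ such that $|\tau| > 0$. 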

Now suppose it is true for some $r \ge 0$. Remember that $\mathbf{Z}(r) = (\mathbf{X}, \mathbf{Y}(r))$ and that 
$\mathbf{Z}(r + 1)$ is defined by 

$$\mathbf{Z}(r + 1) = \int h(\mathbf{Z}(r)) \, d\mathbf{Z}(r)$$

where $h$ is defined in (2) and $f_{y_0}(y) = f(y_0 + y)$. Since $f$ is a polynomial of degree $q$, 
$h$ is also a polynomial of degree $q$ and, thus, it is possible to write 

$$h(z_2) = \sum_{k=0}^{q} h_k(z_1) \frac{(z_2 - z_1)^{\otimes k}}{k!}, \quad \forall z_1, z_2 \in \mathbb{R}^\ell, \qquad (17)$$

where $\ell = n + m$. Then, the integral is defined to be 

$$\mathbf{Z}(r + 1)_{s,t} := \int_s^t h(\mathbf{Z}(r)) \, d\mathbf{Z}(r) = \sum_{k=0}^{q} h_k(\mathbf{Z}(r)_s) \, \mathbf{Z}(r)^{k+1}_{s,t}, \quad \forall (s,t) \in \Delta_T.$$

Let's take a closer look at the functions $h_k : \mathbb{R}^\ell \to L\big((\mathbb{R}^\ell)^{\otimes k}, L(\mathbb{R}^\ell, \mathbb{R}^\ell)\big)$. Since (17) is the 
Taylor expansion for the polynomial $h$, $h_k$ is the $k$-th derivative of $h$. So, for every word 
$\beta \in W_\ell$ such that $|\beta| = k$ and every $z = (x, y) \in \mathbb{R}^\ell$, $(h_k(z))^\beta = \partial_\beta h(z) \in L(\mathbb{R}^\ell, \mathbb{R}^\ell)$. 
By definition, $h$ is independent of $x$ and thus the derivative will always be zero if $\beta$ 
contains any letters in $\{1, \dots, n\}$. 

Remember that $\mathbf{Y}(r + 1)$ is the projection of $\mathbf{Z}(r + 1)$ onto $\mathbb{R}^m$. So, for each 
$j \in \{1, \dots, m\}$, 

g 



V » l)g = Z(r + 1)W> = J2 (hk (Z(r) s ) Z{vf s f) 

k=0 



\ (n+j) 



= E E d T+n h n+hl {Z{r) s )Z{vt; nA 

1=1 T&Wm(0,q) 

n 

= E E drhdyo + Y(r) s )Y(r)^\ (18) 

i=l reW m (0,g) 

where $W_m(k_1, k_2) = \{\tau \in W_m;\ k_1 \le |\tau| \le k_2\}$ for any $k_1, k_2 \in \mathbb{N}$, i.e. it is the set 
of all words of length between $k_1$ and $k_2$. By the induction hypothesis, we know that 
for every $\tau \in W_m$, 

Z(r);t n = Y(r)l )t = Yl <,(yo,s)Xl t 
W\<\t\^Et 

and thus, for every i — 1, . . . , n, 

m% +n ' l) = E ayiMxjf. (19) 



Putting this back to the equation above, we get 

n 

Y(r + l)^ = E E ^/i.* fa> + E o^GM*^ 

and by re-organizing the sums, we get 

Y(r + l)g= £ «Sv(yo,«)X: it , (20) 

M<<^+1=^ 

where = and for every a E W n — 0, 

a r+iAy^ s )= E 5 r/j, CT ,(z/o + ^(^) s )a r V(z/o,s)- 

If a T ra are polynomials of degree < |T|g r , then atr,l are polynomials of degree < q r . 

The result follow by applying lemma 3.2 Notice that (in the notation of lemma 3.2) 

(?) — — 

qi > 1 since at^ lj$ = 0. □ 
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We will now prove ( 11 ) for any p > 1 



Theorem 3.4. The result of lemma 3.3 still holds when X G Gfi p (IR n ) ; /or ant/p > 1. 



Proof. Since $X \in G\Omega_p(\mathbb{R}^n)$, there exists a sequence $\{X(k)\}_{k \ge 1}$ in $G\Omega_1(\mathbb{R}^n)$, such that
$X(k) \to X$ as $k \to \infty$ in the $p$-variation topology. We denote by $Z(k,r)$ and $Z(r)$ the $r$-th
Picard iteration corresponding to the equation driven by $X(k)$ and $X$ respectively.

First, we show that $Z(k,r) \to Z(r)$ and consequently $Y(k,r) \to Y(r)$ as $k \to \infty$ in the
$p$-variation topology, for every $r \ge 0$. It is clearly true for $r = 0$. Now suppose that
it is true for some $r \ge 0$. By definition, $Z(r+1) = \int h(Z(r))\, dZ(r)$. Remember
that the integral is defined as the limit in the $p$-variation topology of the integrals
corresponding to a sequence of 1-rough paths that converge to $Z(r)$ in the $p$-variation
topology. By the induction hypothesis, this sequence can be $Z(k,r)$. It follows that
$Z(k,r+1) = \int h(Z(k,r))\, dZ(k,r)$ converges to $Z(r+1)$, which proves the claim.
Convergence of the rough paths in the $p$-variation topology implies convergence of each
of the iterated integrals, i.e.

$$ Y(k,r)^{\tau}_{s,t} \to Y(r)^{\tau}_{s,t} \quad \text{as } k \to \infty $$

for all $r \ge 0$, $(s,t) \in \Delta_T$ and $\tau \in W_m$.

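By lemma 3.3, since $X(k) \in G\Omega_1(\mathbb{R}^n)$ for every $k \ge 1$, we can write

$$ Y(k,r)^{\tau}_{s,t} = \sum_{|\tau| \le |\sigma| \le |\tau| \frac{q^r - 1}{q - 1}} \alpha^{\tau}_{r,\sigma}(y_0, s)\, X(k)^{\sigma}_{s,t} $$

for every $\tau \in W_m$, $(s,t) \in \Delta_T$ and $k \ge 1$. Since $X(k) \to X$ as $k \to \infty$ in the $p$-variation
topology and the sum is finite, it follows that

$$ Y(r)^{\tau}_{s,t} = \sum_{|\tau| \le |\sigma| \le |\tau| \frac{q^r - 1}{q - 1}} \alpha^{\tau}_{r,\sigma}(y_0, s)\, X^{\sigma}_{s,t}. $$

The statement of the theorem follows. □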



3.2 The Expected Signature Matching Estimator 

We can now give a precise definition of the estimator, which we will formally call
the Expected Signature Matching Estimator (ESME): suppose that we are in
the setting of the problem described in section 2.2 and $M^{\tau}_N$ and $E^{\tau}_r(\theta)$ are defined
as in (9) and (10) respectively, for every $\tau \in W_m$. Let $V \subset W_m$ be a set of $d$ words
constructed from the alphabet $\{1, \dots, m\}$. For each such $V$, we define the ESME $\hat\theta^V_{r,N}$
as the solution to

$$ E^{\tau}_r(\theta) = M^{\tau}_N, \quad \forall \tau \in V. \qquad (21) $$



This definition requires that (21) has a unique solution. This will not be true in
general. Let $\mathcal{V}_r$ be the set of all $V$ containing $d$ words, such that $E^{\tau}_r(\theta) = M^{\tau},\ \forall \tau \in V$
has a unique solution for all $M^{\tau} \in S_{\tau} \subseteq \mathbb{R}$, where $S_{\tau}$ is the set of all possible values of
$M^{\tau}_N$, for any $N \ge 1$. We will assume the following:

Assumption 1 (Observability). The set $\mathcal{V}_r$ is non-empty and known (at least up to
a non-empty subset).

Then, $\hat\theta^V_{r,N}$ can be defined for every $V \in \mathcal{V}_r$.

Remark 3.5. In order to achieve uniqueness of the estimator, we might need some
extra information that we could get by looking at time correlations. We can fit this
into our framework by considering scaled versions of the equation together with the original
one: for example consider the equation

$$ dY_t(\omega) = f(Y_t(\omega); \theta) \cdot dX_t(\omega), \quad Y_0 = y_0 $$
$$ dY(c)_t(\omega) = f(Y(c)_t(\omega); \theta) \cdot dX_{ct}(\omega), \quad Y(c)_0 = y_0 $$

for some appropriate constant $c$. Then, $Y(c)_t = Y_{ct}$ and the expected signature at
$[0, T]$ will also contain information about $E\big(Y^{(j_1)}_T Y^{(j_2)}_{cT}\big)$ for any $j_1, j_2 = 1, \dots, m$.



It is very difficult to say anything about the solutions of system (21), as it is very
general. However, if we assume that $f$ is also a polynomial in $\theta$, then (21) becomes a
system of polynomial equations.

Note that one can also create a Generalized Expected Signature Matching Estimator
as the solution of

$$ P_{\alpha}(E^{\tau}_r(\theta)) = P_{\alpha}(M^{\tau}_N), \quad \text{for } \alpha \in A $$

where $P_{\alpha}$ are polynomials of (empirical or theoretical) expected values of iterated
integrals corresponding to words $\tau$ and $A$ is an appropriate index set.

Remark 3.6. In the case where $Y_t$ is a Markov process, the Generalized Moment
Matching Estimator can be seen as a special case of the Generalized Expected Signature 
Matching Estimator. In that case, the question of identifiability has been studied in 
detail (see J&jj), but without considering the extra approximation of the theoretical 
moments by Picard iteration. 

3.3 Properties of the ESME 



It is possible to show that the ESME defined as the solution of (21) will converge to 
the true value of the parameter and will be asymptotically normal. More precisely, 
the following holds: 

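Theorem 3.7. Let $\hat\theta^V_{r,N}$ be the Expected Signature Matching Estimator for the system
described in section 2.2 and $V \in \mathcal{V}_r$. Assume that the expected signature of $Y_{0,T}$
is finite and that $f(y; \theta)$ is a polynomial of degree $q$ with respect to $y$ and twice
differentiable with respect to $\theta$. Let $\theta_0$ be the 'true' parameter value, meaning that the
distribution of the observed signature $Y_{0,T}$ is $Q^T_{\theta_0}$, defined in (?). Set

$$ D^V_r(\theta)_{i,\tau} = \frac{\partial}{\partial \theta_i} E^{\tau}_r(\theta) \quad \text{and} \quad \Sigma_V(\theta_0)_{\tau,\tau'} = \mathrm{cov}\big(Y^{\tau}_{0,T}, Y^{\tau'}_{0,T}\big) \qquad (22) $$

and assume that $\inf_{r > 0,\, \theta \in \Theta} \|D^V_r(\theta)\| > 0$, i.e. $D^V_r(\theta)$ is uniformly non-degenerate with
respect to $r$ and $\theta$. Then, for $r \propto \log N$ and $T$ sufficiently small,

$$ \hat\theta^V_{r,N} \to \theta_0, \quad \text{with probability 1,} \qquad (23) $$

and

$$ \sqrt{N}\, \Phi_V(\theta_0)^{-1} \big(\hat\theta^V_{r,N} - \theta_0\big) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, I) \qquad (24) $$

as $N \to \infty$, where

$$ \Phi_V(\theta_0) = D^V_r(\theta_0)^{-1}\, \Sigma_V(\theta_0)^{1/2}. \qquad (25) $$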

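Proof. By theorem 3.4 and the definition of $E^{\tau}_r(\theta)$,

$$ E^{\tau}_r(\theta) = \sum_{|\tau| \le |\sigma| \le |\tau| \frac{q^r - 1}{q - 1}} \alpha^{\tau}_{r,\sigma}(y_0; \theta)\, E\big(X^{\sigma}_{0,T}\big) $$

where functions $\alpha^{\tau}_{r,\sigma}(y_0; \theta)$ are constructed recursively, as in lemmas 3.2 and 3.3. Since
$f$ is twice differentiable with respect to $\theta$, functions $\alpha$ and consequently $E^{\tau}_r$ will also
be twice differentiable with respect to $\theta$. Thus, we can write

$$ E^{\tau}_r(\theta) - E^{\tau}_r(\theta_0) = D^V_r(\tilde\theta)_{\cdot,\tau}\, (\theta - \theta_0), \quad \forall \theta \in \Theta $$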



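for some $\tilde\theta$ within a ball of center $\theta_0$ and radius $\|\theta - \theta_0\|$ and the function $D^V_r(\theta)$ is
continuous. By inverting and for $\theta = \hat\theta^V_{r,N}$, we get

$$ \hat\theta^V_{r,N} - \theta_0 = D^V_r\big(\tilde\theta^V_{r,N}\big)^{-1} \Big( E^V_r\big(\hat\theta^V_{r,N}\big) - E^V_r(\theta_0) \Big) \qquad (26) $$

where $E^V_r(\theta) = \{E^{\tau}_r(\theta)\}_{\tau \in V}$. By definition

$$ E^{\tau}_r\big(\hat\theta^V_{r,N}\big) = M^{\tau}_N = \frac{1}{N} \sum_{i=1}^{N} Y^{\tau}_{0,T}(\omega_i) \qquad (27) $$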



where $Y_{0,T}(\omega_i)$ are independent realizations of the random variable $Y_{0,T}$. Suppose
that $T$ is small enough, so that the above Monte-Carlo approximation satisfies both
the Law of Large Numbers and the Central Limit Theorem, i.e. the covariance matrix
satisfies $0 < \|\Sigma_V(\theta_0)\| < \infty$. Then, for $N \to \infty$

$$ \big| E^{\tau}_r\big(\hat\theta^V_{r,N}\big) - E^{\tau}(\theta_0) \big| = \big| E^{\tau}_r\big(\hat\theta^V_{r,N}\big) - E\big(Y^{\tau}_{0,T}\big) \big| \to 0, \quad \forall \tau \in V $$

with probability 1. Note that the convergence does not depend on $r$. Also, for $r \to \infty$

$$ E^{\tau}_r(\theta_0) \to E^{\tau}(\theta_0) $$

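as a result of theorem 2.6. Thus, for $r \propto \log N$

$$ \big| E^{\tau}_r\big(\hat\theta^V_{r,N}\big) - E^{\tau}_r(\theta_0) \big| \to 0, \quad \text{with probability 1, } \forall \tau \in V. $$

Combining this with (26) and the uniform non-degeneracy of $D^V_r$, we get (23). From
(23) and the continuity and uniform non-degeneracy of $D^V_r$, we conclude that

$$ D^V_r(\theta_0)\, D^V_r\big(\tilde\theta^V_{r,N}\big)^{-1} \to I, \quad \text{with probability 1} $$

provided that $T$ is small enough, so that $\|\Sigma_V(\theta_0)\| < \infty$. Now, since by (26)

$$ \hat\theta^V_{r,N} - \theta_0 = D^V_r\big(\tilde\theta^V_{r,N}\big)^{-1} \Big( E^V_r\big(\hat\theta^V_{r,N}\big) - E^V_r(\theta_0) \Big), $$

to prove (24) it is sufficient to prove that

$$ \sqrt{N}\, \Sigma_V(\theta_0)^{-1/2} \Big( E^V_r\big(\hat\theta^V_{r,N}\big) - E^V_r(\theta_0) \Big) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, I). $$

It follows directly from (27) that

$$ \sqrt{N}\, \Sigma_V(\theta_0)^{-1/2} \Big( E^V_r\big(\hat\theta^V_{r,N}\big) - E^V(\theta_0) \Big) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, I). $$

It remains to show that

$$ \sqrt{N}\, \Sigma_V(\theta_0)^{-1/2} \big( E^V_r(\theta_0) - E^V(\theta_0) \big) \to 0. $$

It follows from theorem 2.6 that

$$ \big\| E^V_r(\theta_0) - E^V(\theta_0) \big\| \le C \rho^{-r} $$

for any $\rho > 1$ and sufficiently small $T$. The constant $C$ depends on $V$, $\rho$ and $T$.
Suppose that $r = a \log N$ for some $a > 0$ and choose $\rho > \exp\big(\frac{1}{2a}\big)$. Then

$$ \sqrt{N}\, \big\| E^V_r(\theta_0) - E^V(\theta_0) \big\| \le C N^{\left(\frac{1}{2} - a \log \rho\right)} $$

which proves the claim. □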
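
The role of the condition on the rate constant can be checked directly. The following short computation is a worked restatement of the last step, writing $\rho$ for the rate constant in the bound above and using only $r = a \log N$:

```latex
\begin{align*}
\rho^{-r} &= \exp(-r \log\rho) = \exp(-a \log N \, \log\rho) = N^{-a\log\rho}, \\
\sqrt{N}\, C \rho^{-r} &= C\, N^{\frac{1}{2} - a\log\rho} \longrightarrow 0
\quad\Longleftrightarrow\quad a \log\rho > \tfrac{1}{2}
\quad\Longleftrightarrow\quad \rho > \exp\!\Big(\tfrac{1}{2a}\Big).
\end{align*}
```

So the number of Picard iterations only needs to grow logarithmically in the sample size for the iteration error to be negligible relative to the Monte Carlo error.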

Remark 3.8. We have now completed the discussion of the questions set in section^- 
we provided a way of obtaining an analytic expression for an approximation of $E^{\tau}(\theta)$.
Also, the asymptotic variance of the estimator can be used to compare different choices 
of V and to assess the quality of the estimator. 

In the case of diffusions and the GMM estimator, a discussion on how to optimally 
choose which moments to work on can be found in J^j. 



4 Examples 

In this section, we apply the ESME to a specific example of a diffusion and of a
fractional diffusion.

4.1 Diffusions

First, we apply the ESME to estimate the parameters of the following Stratonovich
SDE:

$$ dY_t = a(1 - Y_t)\, dX^{(1)}_t + b Y_t^2\, dX^{(2)}_t, \quad Y_0 = 0, \qquad (28) $$

where $X^{(1)}_t = t$ and $X^{(2)}_t = W_t$. We chose an SDE because the expected signature of
$(t, W_t)$ can easily be computed explicitly.

After three Picard iterations and replacing the expected signature of $(t, W_t)$ by its
value (see [12]), we get

$$ E\big(Y(3)^{(1)}_{0,t}\big) = at - \frac{a^2}{2} t^2 + \frac{a^3}{6} t^3 + \frac{a^3 b^2}{4} t^4 - \frac{a^4 b^2}{10} t^5 $$

$$
\begin{aligned}
E\big(2\,Y(3)^{(1,1)}_{0,t}\big) ={}& a^2 t^2 - a^3 t^3 + \frac{7a^4}{12} t^4 + \Big( -\frac{a^5}{6} + \frac{7 a^4 b^2}{10} \Big) t^5 + \Big( \frac{a^6}{36} - \frac{17 a^5 b^2}{20} \Big) t^6 + \frac{191 a^6 b^2}{420} t^7 \\
& + \Big( -\frac{11 a^7 b^2}{105} + \frac{21 a^6 b^4}{80} \Big) t^8 + \Big( \frac{a^8 b^2}{144} - \frac{43 a^7 b^4}{180} \Big) t^9 + \frac{33 a^8 b^4}{700} t^{10} + \frac{a^8 b^6}{50} t^{11}.
\end{aligned}
$$

This gives us an approximation of the moments of the solution as polynomials of the 
parameters. 

The empirical moments are computed from the data. We generate 2000 approximate
realizations of paths of the solution using Milstein's method with discretization
step 0.001. We use these paths to approximate the iterated integrals over the interval
$[0, 1/4]$. We use the values $a = 1$ and $b = 2$. Then, we get an approximation to the
empirical moments at $T = 1/4$ by averaging the different realizations of the iterated
integrals of $Y_{0,1/4}$.
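
The data-generation step can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the function names are hypothetical, the horizon is taken to be T = 0.25, and the scheme is the Stratonovich form of Milstein's method. It also uses the fact that for a scalar response the first two signature terms over [0, T] are the increment Y_T - Y_0 and half its square, so the two empirical moments are averages of Y_T and Y_T^2.

```python
import math
import random


def milstein_terminal_value(a, b, T=0.25, dt=0.001, rng=random):
    """Simulate Y_T for dY = a(1 - Y) dt + b Y^2 o dW (Stratonovich), Y_0 = 0."""
    n_steps = round(T / dt)
    y = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        g = b * y * y                        # diffusion coefficient g(y) = b y^2
        dg = 2.0 * b * y                     # its derivative g'(y)
        # Stratonovich Milstein step: correction 0.5 * g * g' * dW^2, with no
        # "- dt" term because the drift is already the Stratonovich drift.
        y += a * (1.0 - y) * dt + g * dw + 0.5 * g * dg * dw * dw
    return y


def empirical_moments(a, b, n_paths=2000, T=0.25, dt=0.001, seed=0):
    """Monte Carlo averages of Y_T and Y_T^2 (i.e. of Y^(1) and 2 Y^(1,1))."""
    rng = random.Random(seed)
    ys = [milstein_terminal_value(a, b, T, dt, rng) for _ in range(n_paths)]
    m1 = sum(ys) / n_paths
    m2 = sum(y * y for y in ys) / n_paths
    return m1, m2
```

With b = 0 the scheme reduces to Euler's method for the ODE dY = a(1 - Y) dt, whose solution 1 - exp(-aT) gives a quick sanity check of the implementation.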

Finally, by equating the empirical and theoretical approximations to the moments
for $t = 1/4$, we get a system of polynomials of $(a, b)$ of degree 14. We get two exact real
solutions to this system: (0.996353, -2.12892) and (0.996353, 2.12892). As expected, the
sign of $b$ cannot be identified. Apart from that, the estimates are very close to the
true values.
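
The non-identifiability of the sign of b can be seen directly from the moment polynomials. The sketch below is illustrative only: the function names are hypothetical, and the coefficients are those obtained by carrying out three Picard iterations for (28) with Y_0 = 0, so they should be read as an assumption of the example. Both moments depend on b only through even powers, and the first moment is linear in b^2, so b^2 can be eliminated once a is fixed.

```python
def picard3_moments(a, b, t):
    """First two moments of the third Picard iterate of (28), as polynomials."""
    m1 = (a * t - a**2 * t**2 / 2 + a**3 * t**3 / 6
          + a**3 * b**2 * t**4 / 4 - a**4 * b**2 * t**5 / 10)
    m2 = (a**2 * t**2 - a**3 * t**3 + 7 * a**4 * t**4 / 12
          + (-a**5 / 6 + 7 * a**4 * b**2 / 10) * t**5
          + (a**6 / 36 - 17 * a**5 * b**2 / 20) * t**6
          + 191 * a**6 * b**2 * t**7 / 420
          + (-11 * a**7 * b**2 / 105 + 21 * a**6 * b**4 / 80) * t**8
          + (a**8 * b**2 / 144 - 43 * a**7 * b**4 / 180) * t**9
          + 33 * a**8 * b**4 * t**10 / 700
          + a**8 * b**6 * t**11 / 50)
    return m1, m2


def b_squared_from_first_moment(a, t, m1):
    """Solve the first-moment equation for b^2, given a (m1 is linear in b^2)."""
    drift_part = a * t - a**2 * t**2 / 2 + a**3 * t**3 / 6
    slope = a**3 * t**4 / 4 - a**4 * t**5 / 10
    return (m1 - drift_part) / slope


# Both signs of b give exactly the same pair of moments.
same = picard3_moments(1.0, 2.0, 0.25) == picard3_moments(1.0, -2.0, 0.25)
```

Substituting the expression for b^2 into the second-moment equation leaves a single polynomial equation in a, which is one way to organise the numerical solution of the system.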

We repeat this process 100 times and get 100 different estimates of $(a, b)$. In figure 1
we show the positive solutions. We also check asymptotic normality: we normalize
the 100 realizations by the asymptotic variance (25), where $D^V_r(\theta)_{i,\tau}$ and $\Sigma_V(\theta_0)_{\tau,\tau'}$
in (22) are computed as follows: the first using the approximation of the theoretical moments
from Picard iterations, and the second from the data by Monte Carlo. The
normalized estimates are shown in figure 2. Their covariance matrix is

$$ \begin{pmatrix} 0.97172 & 0.0243445 \\ 0.0243445 & 0.954654 \end{pmatrix} $$

which is very close to the identity.



4.2 Fractional Diffusions 



Figure 1: 100 realizations of the expected signature matching estimator. True
parameters are a = 1 and b = 2.

Figure 2: 100 realizations of the expected signature matching estimator, after
centering and normalizing by the asymptotic variance.

We now apply the ESME to estimate the parameters of a differential equation
driven by fractional Brownian motion with Hurst parameter $h > 1/4$. We choose the same
vector field as before. Let

$$ dY_t = a(1 - Y_t)\, dX^{(1)}_t + b Y_t^2\, dX^{(2)}_t, \quad Y_0 = 0, \qquad (29) $$

where $X^{(1)}_t = t$ and $X^{(2)}_t = B^h_t$, where $B^h$ is fractional Brownian motion with Hurst
parameter $h$. Fractional Brownian motion generalizes Brownian motion, in the sense
that it is a self-similar Gaussian process. It is defined as the Gaussian process with
covariance given by

$$ E\big(B^h_s B^h_t\big) = \frac{1}{2} \big( |s|^{2h} + |t|^{2h} - |t - s|^{2h} \big). $$

Clearly, for $h = 1/2$ we get independent increments and Brownian motion. For $h > 1/2$
the increments are positively correlated and "smoother" than Brownian motion while
for $h < 1/2$ they are negatively correlated and they get more and more "rough" as $h$
gets smaller. In particular, the paths of fractional Brownian motion possess finite
$p$-variation for every $p > 1/h$.
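
Because fractional Brownian motion is a Gaussian process with an explicit covariance, it can be sampled exactly on a finite grid. The sketch below uses a Cholesky factorization of the covariance matrix, which is one standard choice; the text does not say which exact method was used, and the function names are hypothetical. The grid must consist of distinct, strictly positive times so that the covariance matrix is positive definite.

```python
import math
import random


def fbm_cov(s, t, h):
    """Covariance E(B_s B_t) of fractional Brownian motion with Hurst index h."""
    return 0.5 * (abs(s) ** (2 * h) + abs(t) ** (2 * h) - abs(t - s) ** (2 * h))


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def sample_fbm(times, h, rng):
    """Draw (B_t) for t in `times` by correlating i.i.d. standard normals."""
    cov = [[fbm_cov(s, t, h) for t in times] for s in times]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(times))]


# Covariance of two adjacent increments: negative for h < 1/2.
increment_cov = fbm_cov(0.1, 0.2, 0.4) - fbm_cov(0.1, 0.1, 0.4)
```

The cost of the factorization grows cubically with the number of grid points, so for fine grids a faster exact method may be preferable; the sketch only illustrates the principle.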

Defining integration with respect to fractional Brownian motion is necessary in
order for (29) to make sense. This is non-trivial and it is a very active area of
research. One of the most successful approaches is given by rough paths, but it is
limited to $h > 1/4$ (see [17] or [25] for a more recent approach), i.e. to paths of finite
$p$-variation for $p < 4$.

Having defined (29) as a differential equation driven by the rough path $(t, B^h_t)$, we
can proceed to estimate the parameters $a$ and $b$. As in the diffusion case, we first
construct an approximation to the theoretical moments, using Picard iterations. One
difference is that, up to this moment, an analytic expression for the expected signature
is not known. Instead, we get a numerical approximation by simulating many paths
of fractional Brownian motion, computing their iterated integrals and then averaging.

We need to set some parameters: we choose $T = 1/4$ as before and $h = 11/24$. The Hurst
parameter $h$ is chosen so that the paths are rougher than Brownian motion but
not too much: we will see later that the smaller the $h$, the smaller the discretization
step needs to be in order for the simulation of the paths to be good. We use 1000
paths of fractional Brownian motion with Hurst parameter $h = 11/24$ (these are exact
simulations with discretization step $10^{-3}$) to compute the iterated integrals appearing
in the Picard iteration and then average to approximate their expectations. We get
the following formulas for the theoretical approximation of the first two moments of
the response $Y$:

$$
\begin{aligned}
E\big(Y(3)^{(1)}_{0,1/4}\big) ={}& 0.25a - 0.03125a^2 + 0.00260417a^3 + 0.00044726a^2 b - 0.000111815a^3 b \\
& + 4.97138 \times 10^{-6} a^4 b + 0.00116494a^3 b^2 - 0.000115953a^4 b^2 + 2.53676 \times 10^{-6} a^4 b^3
\end{aligned}
$$

$$
\begin{aligned}
E\big(2\,Y(3)^{(1,1)}_{0,1/4}\big) ={}& 0.0625a^2 - 0.015625a^3 + 0.00227865a^4 - 0.00016276a^5 + 6.78168 \times 10^{-6} a^6 \\
& + 0.00022363a^3 b - 0.0000838612a^4 b + 0.0000118036a^5 b - 8.93081 \times 10^{-7} a^6 b \\
& + 2.58926 \times 10^{-?} a^{?} b + 0.000814373a^4 b^2 - 0.000246738a^5 b^2 + 0.00?3?3?84\, a^6 b^2 \\
& - 1.92279 \times 10^{-6} a^7 b^2 + 3.27969 \times 10^{-8} a^8 b^2 + 4.39419 \times 10^{-6} a^5 b^3 \\
& - 1.24474 \times 10^{-6} a^6 b^3 + 1.21202 \times 10^{-7} a^7 b^3 - 3.26456 \times 10^{-9} a^8 b^3 \\
& + 5.74363 \times 10^{-6} a^6 b^4 - 1.31226 \times 10^{-6} a^7 b^4 + 6.56898 \times 10^{-8} a^8 b^4 \\
& + 1.3868 \times 10^{-?} a^{?} b^5 - 1.39803 \times 10^{-?} a^{?} b^5 + 8.47574 \times 10^{-9} a^8 b^6
\end{aligned}
$$

Figure 3: 100 realizations of the expected signature matching estimator. True
parameters are a = 1 and b = 2.


We create the data by numerically simulating 2000 paths of the solution of (29) for
$h = 11/24$, $a = 1$ and $b = 2$ and discretization step $\delta = 10^{-3}$. We use a method proposed
by Davie that is the equivalent of Milstein's method for differential equations driven
by fractional Brownian motion (see [27] and references within). The error is of order
$\delta^{3h-1}$, which for our choices of discretization step $\delta$ and Hurst parameter $h$ is 0.075.
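
As a check on the quoted error size, the arithmetic can be written out, assuming $h = 11/24$ (the value consistent with the reported figure of 0.075) and $\delta = 10^{-3}$:

```latex
\[
\delta^{3h-1} = \big(10^{-3}\big)^{3 \cdot \frac{11}{24} - 1}
             = \big(10^{-3}\big)^{3/8}
             = 10^{-9/8} \approx 0.075 .
\]
```

The exponent $3h - 1$ shrinks as $h$ decreases towards $1/3$, which is why rougher driving paths require a much finer discretization step for the same accuracy.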

Finally, we match the theoretical moments that are polynomials of $(a, b)$ with the
empirical moments and solve the system. As in the diffusion case, we get two solutions
corresponding to $b$ positive or negative. Since fractional Brownian motion is a mean
zero Gaussian process, we cannot expect to identify the sign of $b$.

We repeat the process 100 times to get 100 realizations of the estimates. These are
shown in figure 3. Also, figure 4 shows the estimates after centering and normalizing
by the asymptotic variance.

Figure 4: 100 realizations of the expected signature matching estimator, after
centering and normalizing by the asymptotic variance.